 continuous-time representation


Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Neural Information Processing Systems

We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history -- doing so by solving $d$ coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree $d - 1$. Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time -- exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales. We demonstrate that LMU memory cells can be implemented using $m$ recurrently-connected Poisson spiking neurons, $\mathcal{O}(m)$ time and memory, with error scaling as $\mathcal{O}(d / \sqrt{m})$. We discuss implementations of LMUs on analog and digital neuromorphic hardware.
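The memory described in this abstract can be illustrated with a minimal NumPy sketch. It assumes the state-space form $\theta \dot{m}(t) = A m(t) + B u(t)$ with the $(A, B)$ entries as given in the LMU paper, and it decodes a delayed input from the state using shifted Legendre polynomials. The bilinear discretization, the function names (`lmu_matrices`, `discretize_bilinear`, `run_memory`, `decode_delay`), and the window length `theta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from numpy.polynomial import legendre


def lmu_matrices(d):
    """Continuous-time (A, B) for theta * dm/dt = A m + B u (assumed LMU form)."""
    q = np.arange(d)
    i, j = np.meshgrid(q, q, indexing="ij")
    # a_ij = (2i + 1) * (-1 if i < j else (-1)^(i - j + 1))
    A = (2 * i + 1) * np.where(i < j, -1.0, (-1.0) ** (i - j + 1))
    # b_i = (2i + 1) * (-1)^i
    B = (2 * q + 1) * (-1.0) ** q
    return A, B


def discretize_bilinear(A, B, dt, theta):
    """Bilinear (Tustin-style) step; an illustrative choice of ODE solver."""
    h = dt / theta
    eye = np.eye(A.shape[0])
    lhs = eye - 0.5 * h * A
    Ad = np.linalg.solve(lhs, eye + 0.5 * h * A)
    Bd = np.linalg.solve(lhs, h * B)
    return Ad, Bd


def run_memory(u, d, dt, theta):
    """Feed the scalar sequence u through the d-dimensional linear memory."""
    A, B = lmu_matrices(d)
    Ad, Bd = discretize_bilinear(A, B, dt, theta)
    m = np.zeros(d)
    states = np.empty((len(u), d))
    for k, u_k in enumerate(u):
        m = Ad @ m + Bd * u_k
        states[k] = m
    return states


def decode_delay(states, r):
    """Estimate u(t - r * theta), 0 <= r <= 1, from the memory states.

    The weights are Legendre polynomials of degree 0..d-1 evaluated at
    2r - 1, i.e. the shifted Legendre polynomials evaluated at r.
    """
    degree = states.shape[1] - 1
    weights = legendre.legvander(np.array([2.0 * r - 1.0]), degree)[0]
    return states @ weights
```

As a usage example, driving the memory with `u = np.sin(np.pi * t)` for `t = np.arange(0, 6, 1e-3)`, `d = 8`, and `theta = 1.0`, then calling `decode_delay(states, 1.0)`, should approximately reproduce the input delayed by one window length once the initial transient has decayed.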


Reviews: Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Neural Information Processing Systems

Originality: The use of Legendre polynomials seems rather creative; it was certainly important to define RNNs with good models of coupled linear units. Quality: The set of benchmarks is well chosen to cover a broad range of qualities that RNNs require. One non-artificial task would have been a plus, though. What would have been even more important is to support the theory by controlling for the importance of the initialization of the matrices A and B. What if A were initialized with a clever diagonal (for instance, the diagonal of A_bar)? As the architecture is already rather close to that of the NRU, one may wonder whether the architecture is not doing most of the job.


Reviews: Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Neural Information Processing Systems

This paper proposes a new memory layout for recurrent neural networks that (1) is theoretically grounded and (2) allows for orders of magnitude longer memory than traditional approaches at comparable parameter cost. The results are also confirmed experimentally. This work is definitely of interest to the NeurIPS community and would be a great contribution to the conference.



Rough Transformers for Continuous and Efficient Time-Series Modelling

Moreno-Pino, Fernando, Arroyo, Álvaro, Waldon, Harrison, Dong, Xiaowen, Cartea, Álvaro

arXiv.org Machine Learning

Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to model irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of moderate lengths and greater. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model which operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, critical for addressing long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global dependencies in input data, while remaining robust to changes in the sequence length and sampling frequency. We find that Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models using a fraction of the computational time and memory resources on synthetic and real-world time-series tasks.
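The robustness to sampling frequency mentioned in this abstract rests on a general property of path signatures: they do not change when extra sample points are added along the same path. The sketch below computes only the first two signature levels of a piecewise-linear path using Chen's identity. It is a hedged illustration of that property, not the multi-view signature attention of the paper, and the function name `signature_depth2` is hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """First two signature levels of a piecewise-linear path.

    path: array of shape (n_points, dim). Returns (level1, level2), where
    level1 is the total increment and level2 holds the iterated integrals.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for delta in np.diff(path, axis=0):
        # Chen's identity: append one linear segment with increment delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


# The same L-shaped path sampled coarsely and with extra intermediate points.
coarse = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
fine = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.3), (1.0, 1.0)]
s1_coarse, s2_coarse = signature_depth2(coarse)
s1_fine, s2_fine = signature_depth2(fine)
```

Both samplings give the same two signature levels, since the extra points lie on the original segments.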


Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

Voelker, Aaron, Kajić, Ivana, Eliasmith, Chris

Neural Information Processing Systems

We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history -- doing so by solving $d$ coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree $d - 1$. Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time -- exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size.


r/MachineLearning - [R] Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks (NeurIPS2019 Spotlight)

#artificialintelligence

We propose a novel memory cell for recurrent neural networks that dynamically maintains information across long windows of time using relatively few resources. The Legendre Memory Unit (LMU) is mathematically derived to orthogonalize its continuous-time history--doing so by solving d coupled ordinary differential equations (ODEs), whose phase space linearly maps onto sliding windows of time via the Legendre polynomials up to degree d - 1. Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time--exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales.